(54) Low bit rate speech encoder and decoder



(57) Methods and systems for encoding a speech 
signal into a bit stream, and recreating the speech signal 
from the bit stream are disclosed. An analog-to-digital 
converter (20) forms a digital signal based upon an an- 
alog speech signal. A phoneme parser (22) parses the 
digital signal into at least one phoneme. A phoneme rec- 
ognizer (24) assigns a symbolic code to each phoneme 
based upon recognition of the phonemes from a prede- 
termined set. A read-only memory (34) contains a stand- 
ard waveform representation of each phoneme from the 
predetermined set. A difference processor (32) forms a 
difference signal between a user-spoken phoneme 
waveform and a corresponding waveform from the 
read-only memory (34). The difference signal is stored 
in a storage device (40). A multiplexer (30) provides a bit 
stream signal based upon the symbolic code and the dif- 
ference signal. A synchronizer (70) extracts the symbolic 
code and the difference signal from the bit stream. A pho- 
neme generator (76) forms the speech signal based 
upon the symbolic code and the difference signal. 
Description
Technical Field 



The present invention relates generally to methods 
and systems for speech signal processing, and more 
particularly, to methods and systems for encoding and 
decoding speech signals. 

Background of the Invention 

Speech compression systems are employed to re- 
duce the number of bits needed to transmit and store a 
digitally-sampled speech signal. As a result, a lower 
bandwidth communication channel can be employed to 
transmit a compressed speech signal in comparison to 
an uncompressed speech signal. Similarly, a reduced 
capacity of a storage device, which can comprise a mem- 
ory or a magnetic storage medium, is required for storing 
the compressed speech signal. A general speech com- 
pression system includes an encoder, which converts 
the speech signal into a compressed signal, and a de- 
coder, which recreates the speech signal based upon the 
compressed signal. 

In the design of the speech compression system, an 
objective is to reduce the number of bits needed to rep- 
resent the speech signal while preserving its message 
content and intelligibility. Current methods and systems 
for speech compression have achieved a reasonable 
quality of message preservation at a transmission bit rate 
of 4.8 kilobits per second. These methods and systems 
are based upon directly compressing a waveform repre- 
sentation of the speech signal. 

Summary of the Invention 

The need exists for a speech compression system 
which significantly reduces the number of bits needed to 
transmit and store a speech signal, and which
simultaneously preserves the message content of the
speech signal.

It is thus an object of the present invention to signif- 
icantly reduce the bit rate needed to transmit a speech 
signal. 

Another object of the present invention is to provide
a speech encoder and corresponding speech decoder 
which allows a selectable personalization of an encoded 
speech signal. 

A further object of the present invention is to provide 
a symbolic encoding and decoding of a speech signal. 

In carrying out the above objects, the present inven- 
tion provides a system for encoding a speech signal into 
a bit stream. A phoneme parser parses the speech signal 
into at least one phoneme. A phoneme recognizer,
coupled to the phoneme parser, assigns a symbolic code to
each of the at least one phoneme based upon recogni- 
tion of the at least one phoneme from a predetermined 
phoneme set. A difference processor forms a difference
signal between a user-spoken phoneme waveform and
a corresponding waveform from a standard waveform
set. The bit stream is based upon the difference signal
and the symbolic code of each of the at least one
phoneme.

Further in carrying out the above objects, the 
present invention provides a system for recreating a 
speech signal from a bit stream representative of an en- 
coded speech signal. A synchronizer extracts at least 
one symbolic code from the bit stream, wherein each of
the at least one symbolic code is representative of a
corresponding phoneme from a predetermined phoneme
set. The synchronizer further extracts at least one
difference signal representative of a difference between a
first phoneme waveform and a second phoneme waveform.
A phoneme generator, which is coupled to the
synchronizer, forms the speech signal by generating a
corresponding phoneme waveform for each of the at least
one symbolic code extracted by the synchronizer in
dependence upon the at least one difference signal.

Still further in carrying out the above objects, the 
present invention provides a method of encoding a 
speech signal into a bit stream. The speech signal is 
parsed into at least one phoneme. The at least one
phoneme is recognized from a predetermined phoneme set.
A symbolic code is assigned to each of the at least one
phoneme. A difference signal is formed between a
user-spoken phoneme waveform and a corresponding
phoneme waveform from a standard waveform set. The
bit stream is formed based upon the difference signal
and the symbolic code of each of the at least one
phoneme.

Yet still further in carrying out the above objects, the
present invention provides a method of recreating a
speech signal from a bit stream representative of an
encoded speech signal. At least one symbolic code is
extracted from the bit stream, wherein each of the at least
one symbolic code is representative of a corresponding
phoneme from a predetermined phoneme set. At least
one difference signal is extracted from the bit stream,
wherein the at least one difference signal is
representative of a difference between a first phoneme waveform
and a second phoneme waveform. The recreated
speech signal is formed by generating a corresponding
phoneme waveform for each of the at least one symbolic
code in dependence upon the at least one difference
signal.

These and other features, aspects, and advantages 
of the present invention will become better understood
with regard to the following description, appended
claims, and accompanying drawings. 

Brief Description of the Drawings 



FIGURE 1 is a block diagram of an embodiment of
an encoder in accordance with the present
invention;

FIGURE 2 is a flow chart of a method of encoding a
speech signal; 

FIGURE 3 is a block diagram of an embodiment of
a decoder in accordance with the present
invention; and

FIGURE 4 is a flow chart of a method of decoding a 
speech signal. 

Best Modes for Carrying out the Invention 

In overcoming the disadvantages of previous sys- 
tems, the present invention provides an encoder/trans- 
mitter and a corresponding decoder/receiver which em- 
ploy phoneme recognition and coding. Phonemes rep- 
resent the basic unit of speech, i.e. the fundamental 
sounds, of which there are approximately forty in the 
English language. By determining the phonemes which
were spoken by a user, symbolically coding the
phonemes for transmission, and generating an appropriate
phoneme waveform in response to receiving the coded 
phonemes, the original speech can be recreated. Fur- 
ther, the decoder can include an adaptive section which 
personalizes the synthesized voice based upon a per- 
sonalization increment learned during a training mode of 
the encoder. 

An embodiment of a speech encoder in accordance 
with the present invention is illustrated by the block
diagram in Figure 1. The speech encoder provides a system
for encoding a speech signal into a bit stream signal for
transmission to a corresponding decoder. An analog
speech signal is applied to an analog-to-digital converter
20. The analog-to-digital converter 20 digitizes the ana- 
log speech signal to form a digital speech signal. A pho- 
neme parser 22 is coupled to the analog-to-digital con- 
verter 20. The phoneme parser 22 identifies the time 
base for each phoneme contained within the digital 
speech signal, and parses the digital speech signal into 
at least one phoneme based upon the time base. 

The phoneme parser 22 is coupled to a phoneme 
recognizer 24 which recognizes the at least one pho- 
neme from a predetermined phoneme set, and assigns 
a symbolic code to each of the at least one phoneme. In 
a preferred embodiment for the English language, the 
phoneme recognizer 24 assigns a unique six-bit symbol- 
ic code to each of the approximately forty phonemes in 
the English language. It is noted that the number of bits 
employed in coding each phoneme in the English lan- 
guage is not limited to six. For example, eight-bit codes, 
capable of representing 256 different phonemes, can 
also be employed. One with ordinary skill in the art will 
recognize that the number of bits needed for coding the 
phonemes is dependent upon the number of phonemes 
in the language of interest. 
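
The relationship between the size of the phoneme set and the width of a fixed-length symbolic code can be sketched as follows. This is a minimal illustration only; the function name `bits_for_phoneme_set` is hypothetical and does not appear in the disclosure.

```python
def bits_for_phoneme_set(num_phonemes: int) -> int:
    """Smallest fixed code width (in bits) giving every phoneme a unique code."""
    if num_phonemes < 1:
        raise ValueError("need at least one phoneme")
    # A width of b bits yields 2**b distinct codes.
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 2.
    return max(1, (num_phonemes - 1).bit_length())


# Roughly forty English phonemes fit in a six-bit code.
print(bits_for_phoneme_set(40))
# An eight-bit code is wide enough for a set of 256 phonemes.
print(bits_for_phoneme_set(256))
```

A language with more than 64 phonemes would therefore need at least seven bits per fixed-length code.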

The symbolic code from the phoneme recognizer 24
is applied to a variable length coder 26. The variable
length coder 26 provides a variable length code of the
symbolic code based upon the relative likelihood of the
corresponding phoneme to be spoken. More specifically,
phonemes which occur frequently in typical speech are
coded with shorter length codes, while phonemes
which occur infrequently are coded with longer length
codes. The variable length coder 26 is employed to
reduce the average number of bits needed to represent a
typical speech signal. In a preferred embodiment, the
variable length coder employs a Huffman coding
scheme. The variable length coder 26 is coupled to a
multiplexer 30 which formats the variable length code
into a serial bit stream.
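
A Huffman coding scheme of the kind named above can be sketched as follows. The phoneme labels and relative frequencies are hypothetical placeholders, not measured data, and the function `huffman_codes` is an illustrative name rather than part of the disclosed encoder.

```python
import heapq
from itertools import count


def huffman_codes(frequencies):
    """Return {symbol: bit string} for a Huffman code built from relative frequencies."""
    if not frequencies:
        raise ValueError("need at least one symbol")
    tiebreak = count()  # keeps heap entries comparable when weights are equal
    heap = [(w, next(tiebreak), {sym: ""}) for sym, w in frequencies.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A lone symbol still needs one bit on the wire.
        return {sym: "0" for sym in heap[0][2]}
    while len(heap) > 1:
        # Merge the two least likely subtrees, prefixing their codes with 0 and 1.
        w0, _, t0 = heapq.heappop(heap)
        w1, _, t1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t0.items()}
        merged.update({s: "1" + c for s, c in t1.items()})
        heapq.heappush(heap, (w0 + w1, next(tiebreak), merged))
    return heap[0][2]


# Hypothetical relative frequencies (percent) for five phoneme labels.
example_freqs = {"AH": 40, "T": 25, "N": 20, "ZH": 10, "OY": 5}
codes = huffman_codes(example_freqs)
for symbol, code in sorted(codes.items(), key=lambda item: len(item[1])):
    print(symbol, code)
```

With these assumed frequencies the most frequent label receives a one-bit code and the two rarest receive four-bit codes, so the average code length falls below that of a fixed-length code.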

The phoneme parser 22 is coupled to a difference
processor 32 which forms a difference signal between a
user-spoken phoneme waveform and a corresponding
waveform from a standard phoneme waveform library.
The standard phoneme waveform library is contained
within a first electronic storage device 34, such as a
read-only memory, coupled to the difference processor
32. The first electronic storage device 34 contains a
standard waveform representation of each phoneme
from the predetermined phoneme set.

The difference signal is compressed by a data
compressor 36 coupled to the output of the difference
processor 32. A representation of the compressed difference
signal is stored in a second electronic storage device 40.
As a result, the second electronic storage device 40
contains a personal phoneme library for the user of the
encoder. The multiplexer 30 is coupled to the second
electronic storage device 40 so that the bit stream provided
thereby is based upon both the symbolic code generated
by the phoneme recognizer 24 and the representation of
the difference signal. In a preferred embodiment, the
multiplexer 30 formats a header based upon the
personal phoneme library upon an initiation of transmission.
After transmitting any synchronization or initiation bits, if
necessary, the header is transmitted followed by the
coded serial speech bit stream.

The combination of the difference processor 32, the
first electronic storage device 34, the data compressor
36, and the second electronic storage device 40 forms
a system which performs a personalization training of the
encoder. Thus, in a predetermined training mode, the
output of the phoneme parser 22 is compared to the
standard phoneme waveform library, and a difference
phoneme waveform, i.e. a delta phoneme waveform, is
formed and compressed. The delta phoneme waveform
is then stored in the personal phoneme library of the
encoder for later transmission.
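
The formation of a delta phoneme waveform in the training mode can be sketched as follows. The disclosure does not specify how the difference is computed; this sketch assumes a sample-by-sample subtraction of time-aligned waveforms of equal length, omits the data compressor 36, and uses hypothetical sample values and names.

```python
def delta_waveform(user_samples, standard_samples):
    """Sample-by-sample difference between a spoken phoneme and its library entry."""
    if len(user_samples) != len(standard_samples):
        # Assumption: both waveforms were already aligned to a common length.
        raise ValueError("waveforms must be time-aligned to the same length")
    return [u - s for u, s in zip(user_samples, standard_samples)]


# Hypothetical standard library entry and user recording for one phoneme label.
standard_library = {"AH": [0, 120, 200, 120, 0, -120]}
user_ah = [0, 130, 190, 125, -5, -118]

# The personal phoneme library holds one delta waveform per trained phoneme.
personal_library = {"AH": delta_waveform(user_ah, standard_library["AH"])}
print(personal_library["AH"])
```

Because the delta values are small relative to the waveform itself, they are a natural candidate for compression before storage.
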

In accordance with the present invention, an
embodiment of a method of encoding a speech signal into a bit
stream signal is illustrated by the flow chart in Figure 2.
If the speech signal is an analog speech signal, then a
step of converting the analog speech signal into a digital
speech signal is performed in block 50. A step of parsing
the digital speech signal into at least one phoneme is
performed in block 52. In block 54, a step of recognizing
the at least one phoneme is performed. Block 56
performs a step of assigning a symbolic code to each of the
at least one phoneme. Blocks 60 and 62, which can be
performed prior to blocks 52, 54, and 56, perform the
steps of forming a difference signal between a
user-spoken phoneme waveform and a corresponding phoneme
waveform from a standard phoneme waveform set, and
storing a representation of the difference signal. In block
64, a step of multiplexing the symbolic code with the
representation of the difference signal to form the bit stream
signal is performed.

In accordance with the present invention, an embod- 
iment of a decoder is illustrated by the block diagram in 
Figure 3. The decoder provides a system for recreating 
a speech signal from a bit stream, representative of an 
encoded speech signal, received from a corresponding 
encoder. The bit stream enters a synchronizer 70, which 
generates an internal clock signal in order to lock onto 
the bit stream. The synchronizer 70 extracts at least one 
difference signal representative of a difference between 
a user-spoken phoneme waveform and a corresponding 
phoneme waveform from a standard phoneme wave- 
form set. In a preferred embodiment, the at least one dif- 
ference signal is received within a header in the bit 
stream. The synchronizer 70 is coupled to a storage de- 
vice 72 which stores a representation of the at least one 
difference signal. In a preferred embodiment, the syn- 
chronizer sends the header to the storage device 72. As 
a result, the storage device 72, which can be embodied 
by a standard DRAM (dynamic random access memory), 
forms a guest personal phoneme library for the decoder.

The synchronizer 70 further extracts at least one 
symbolic code from the bit stream, wherein each of the 
at least one symbolic code is representative of a corre- 
sponding phoneme from a predetermined phoneme set. 
In a preferred embodiment, the synchronizer 70 blocks 
the bit stream into variable length blocks, each repre- 
senting a phoneme. The at least one symbolic code is 
applied to a phoneme generator 74, which is coupled to 
the synchronizer 70. The phoneme generator 74 in- 
cludes a standard phoneme waveform generator 76 
which generates a corresponding phoneme waveform 
from the standard waveform set for each of the at least 
one symbolic code. The phoneme generator 74 can fur- 
ther include a look-up table which converts the variable 
length blocks to fixed length blocks to address the pho- 
neme waveform generator 76. In a preferred embodi- 
ment, each of the blocks selects a particular phoneme 
from the standard waveform set. As a result, a recreated 
speech signal, typically represented digitally, is formed. 
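
The blocking of the bit stream into variable length blocks, and the look-up that maps each block to a phoneme, can be sketched as follows. The sketch assumes a prefix-free code table such as a Huffman code; the table contents, phoneme labels, and function name are hypothetical, and the header and synchronization bits are ignored.

```python
def decode_variable_length(bits, code_table):
    """Split a bit string into phoneme symbols using a prefix-free code table."""
    lookup = {code: symbol for symbol, code in code_table.items()}
    symbols, current = [], ""
    for bit in bits:
        current += bit
        # In a prefix-free code, the first match is a complete block.
        if current in lookup:
            symbols.append(lookup[current])
            current = ""
    if current:
        raise ValueError("bit stream ended in the middle of a code")
    return symbols


# Hypothetical variable length code table for five phoneme labels.
code_table = {"AH": "0", "T": "10", "N": "111", "ZH": "1101", "OY": "1100"}
decoded = decode_variable_length("10" + "0" + "111", code_table)
print(decoded)
```

Each decoded symbol then serves as the fixed-length address into the standard waveform set.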

The phoneme generator 74 is further coupled to the 
storage device 72. The storage device 72 provides the 
at least one difference signal to the phoneme generator 
so that the recreated speech signal can be modified in 
dependence thereupon. More specifically, the phoneme 
generator 74 includes a summing element 80 which 
combines the phoneme waveform from the standard 
waveform set with the difference signal in order to rec- 
reate the voice of the original speaker. The output of the 
phoneme generator 74 is applied to a digital-to-analog
converter 82 in order to form an analog recreated speech 
signal. 
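
The role of the summing element 80 can be sketched as follows. The sketch assumes that the difference signal has already been decompressed into samples aligned with the standard waveform, and that a phoneme without a stored difference signal falls back to the purely standard waveform; all names and sample values are hypothetical.

```python
def recreate_phoneme(symbolic_code, standard_library, guest_library=None):
    """Standard waveform for a code, plus the speaker's delta when one is stored."""
    standard = standard_library[symbolic_code]
    delta = (guest_library or {}).get(symbolic_code)
    if delta is None:
        # No personalization available: emit the synthetic standard waveform.
        return list(standard)
    # Summing element: standard waveform + difference signal, sample by sample.
    return [s + d for s, d in zip(standard, delta)]


# Hypothetical standard waveform set and guest personal phoneme library.
standard_library = {"AH": [0, 120, 200, 120, 0, -120]}
guest_library = {"AH": [0, 10, -10, 5, -5, 2]}

personalized = recreate_phoneme("AH", standard_library, guest_library)
synthetic = recreate_phoneme("AH", standard_library)
print(personalized)
print(synthetic)
```

Adding the stored delta to the standard waveform reverses the subtraction performed at the encoder, which is why the two ends must share the same standard waveform set.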

In accordance with the present invention, an
embodiment of a method of recreating a speech signal from a
bit stream representative of an encoded speech signal
is illustrated by the flow chart in Figure 4. A step of
extracting at least one difference signal representative of
a difference between a user-spoken phoneme waveform
and a corresponding phoneme waveform from a
standard phoneme waveform set is performed in block 90.
Block 92 performs a step of storing a representation of
the at least one difference signal. In block 94, a step of
extracting at least one symbolic code from the bit stream
is performed, wherein each of the at least one symbolic
code is representative of a corresponding phoneme from
a predetermined phoneme set. A step of forming a digital
recreated speech signal is performed in block 96. More
specifically, a corresponding phoneme waveform from
the standard phoneme waveform set is generated for
each of the at least one symbolic code. Block 98
performs a step of modifying the digital recreated speech
signal in dependence upon the at least one difference
signal. In block 100, an optional step of converting the
digital recreated speech signal into an analog recreated
speech signal is performed.

The above-described embodiments of the present
invention have many advantages. By recognizing and
symbolically encoding phonemes, the required bit rate
for transmitting a speech signal is significantly reduced.
For example, if an average phoneme lasts about 100
milliseconds, the encoded speech signal using six bits per
phoneme can be transmitted at a bit rate of 60 bits per
second.
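
The arithmetic behind the 60 bits per second figure can be checked with a short calculation. The sketch counts only the symbolic codes and assumes fixed-length coding; header bits carrying the personal phoneme library are excluded, and the function name is hypothetical.

```python
def symbolic_bit_rate(bits_per_phoneme, phoneme_duration_ms):
    """Bits per second needed to send one fixed-length code per phoneme."""
    phonemes_per_second = 1000 / phoneme_duration_ms
    return bits_per_phoneme * phonemes_per_second


rate = symbolic_bit_rate(6, 100)   # six bits every 100 milliseconds
reduction = 4800 / rate            # relative to the 4.8 kilobit per second figure
print(rate, reduction)
```

Under these assumptions the symbolic stream needs 60 bits per second, a factor of 80 below the 4.8 kilobits per second cited in the background.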

Another advantage of the present invention is the
selectable personalization of the recreated speech
which results from employing a personal phoneme
library. Embodiments can include a default option which
produces a purely synthetic voice in order to attain the
lowest bit rate for operation. Similarly, a higher quality of
speech can be produced in return for a higher bit rate of
operation. As a result, the use of the personal phoneme
library lends itself to adaptability. By determining the
capacity of the decoder and a communication link which
couples the encoder and decoder, the encoder can adapt
to this capacity by sending out some of the
personalization library in successive headers.

A further advantage of the present invention is that
modern speech recognizers, which are capable of
performing steps of phoneme parsing and statistical
analysis of combinations of phonemes in forming words, can
be employed in its implementation.

While the best mode for carrying out the invention
has been described in detail, those familiar with the art
to which this invention relates will recognize various
alternative designs and embodiments for practicing the
invention as defined by the following claims.




EP 0 706 172 A1



Claims 



1. A system for encoding a speech signal into a bit
stream, the system comprising:

a phoneme parser (22) which parses the
speech signal into at least one phoneme;
a phoneme recognizer (24), coupled to the
phoneme parser (22), which assigns a symbolic
code to each of the at least one phoneme based
upon recognition of the at least one phoneme
from a predetermined phoneme set; and
a difference processor (32), coupled to the
phoneme parser, which forms a difference signal
between a user-spoken phoneme waveform
and a corresponding phoneme waveform from
a standard waveform set;
wherein the bit stream is based upon the
difference signal and the symbolic code of each of
the at least one phoneme.

2. The system of claim 1 further
comprising a first storage device (34) which contains
a standard waveform representation of each
phoneme from the predetermined phoneme set, the first
storage device (34) coupled to the difference
processor (32) to provide the corresponding phoneme
waveform thereto.

3. The system of claim 1 further
comprising a second storage device (40), coupled
to the difference processor (32), in which a
representation of the difference signal is stored.

4. The system of claim 3 further
comprising a multiplexer (30), coupled to the
phoneme recognizer (24) and to the second storage
device (40), which provides the bit stream based
upon the symbolic code and the representation of
the difference signal.

5. The system of claim 4 further
comprising a variable length coder (26), interposed
between the phoneme recognizer (24) and the
multiplexer (30), which provides a variable length code
of the symbolic code for application to the multiplexer
(30).

6. A method of encoding a speech signal into a bit
stream, the method comprising the steps of:

parsing the speech signal into at least one
phoneme;
recognizing the at least one phoneme from a
predetermined phoneme set;
assigning a symbolic code to each of the at least
one phoneme;
forming a difference signal between a
user-spoken phoneme waveform and a corresponding
phoneme waveform from a standard waveform
set; and
forming the bit stream based upon the
difference signal and the symbolic code of each of
the at least one phoneme.

7. The method of claim 6 further
comprising the step of storing a standard waveform
representation of each phoneme from the
predetermined phoneme set.

8. The method of claim 6 further
comprising the step of storing a representation of the
difference signal.

9. The method of claim 8 wherein the step of forming
the bit stream includes the step of multiplexing the
symbolic code with the representation of the
difference signal.

10. The method of claim 9 further comprising the step
of variable length coding the symbolic code.